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ABSTRACT 

Improving  the  performance  of  ensemble  filters  applied  to  models  with  many  state  variables  requires  reg¬ 
ularization  of  the  covariance  estimates  by  localizing  the  impact  of  observations  on  state  variables.  A  co- 
variance  localization  technique  based  on  modeling  of  the  sample  covariance  with  polynomial  functions  of  the 
diffusion  operator  (DL  method)  is  presented.  Performance  of  the  technique  is  compared  with  the  non- 
adaptive  (NAL)  and  adaptive  (AL)  ensemble  localization  schemes  in  the  framework  of  numerical  experi¬ 
ments  with  synthetic  covariance  matrices  in  a  realistically  inhomogeneous  setting.  It  is  shown  that  the  DL 
approach  is  comparable  in  accuracy  with  the  AL  method  when  the  ensemble  size  is  less  than  100.  With  larger 
ensembles,  the  accuracy  of  the  DL  approach  is  limited  by  the  local  homogeneity  assumption  underlying  the 
technique.  Computationally,  the  DL  method  is  comparable  with  the  NAL  technique  if  the  ratio  of  the  local 
decorrelation  scale  to  the  grid  step  is  not  too  large. 


1*  Introduction 

The  problem  of  estimating  the  background  error  sta¬ 
tistics  is  an  important  issue  in  the  ensemble  filtering  and 
hybrid  data  assimilation  algorithms  that  employ  en¬ 
sembles  for  error  analysis  and  propagation.  Increasing 
the  accuracy  in  estimating  the  background  error  statis¬ 
tics  remains  a  scientific  and  technical  challenge,  because 
the  (co)variance  estimates  have  to  be  drawn  from  a  rel¬ 
atively  small  number  of  samples  contaminated  by  the 
noise  of  diverse  origin. 

A  particular  type  of  background  error  covariance  (BEC) 
estimation  technique  employs  an  ensemble  of  assimila¬ 
tions  (e.g.,  Fisher  2003;  Berre  et  al.  2006)  to  assess  the 
covariance  structure  from  the  ensemble  average.  Because 
of  computational  limitations,  ensemble  size  rarely  ex¬ 
ceeds  100  members  in  practice,  thus  limiting  the  accuracy 
of  the  straightforward  averaging  approach  because  of 
the  significant  level  of  sampling  noise.  The  impact  of 
sampling  noise  on  the  accuracy  of  the  BEC  estimates  has 
been  addressed  by  Houtekamer  and  Mitchell  (1998)  and 
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Hamill  et  al.  (2001)  and  led  to  the  development  of  the 
filtering  techniques  based  on  the  Schur  product  of  the 
sample  correlations  with  the  heuristic  filters  (localization 
operators).  This  approach  tends  to  localize  covariances  in 
physical  space  and  suppresses  long-range  correlations, 
whose  accuracy  is  most  affected  by  the  sampling  noise 
(e.g.,  Houtekamer  and  Mitchell  2001;  Buehner  2005). 

In  the  last  decade,  the  localization  techniques  have 
been  under  rapid  development  in  several  directions  with 
the  major  objective  to  relax  the  spatial  homogeneity 
assumption  underlying  the  original  scheme.  In  particu¬ 
lar,  Fisher  (2003),  Deckmyn  and  Berre  (2005),  and 
Pannekoucke  et  al.  (2007)  utilized  a  wavelet  approach  to 
account  for  inhomogeneities  in  the  covariance  structure; 
Wu  et  al.  (2002)  and  Purser  et  al.  (2003)  employed  re¬ 
cursive  filters  to  localize  the  covariances;  Weaver  and 
Courtier  (2001),  Pannekoucke  and  Massart  (2008),  and 
Weaver  and  Mirouze  (2012)  used  a  closely  related  dif¬ 
fusion  operator  approach;  and  Pannekoucke  (2009)  ex¬ 
plored  a  hybrid  scheme,  featuring  wavelet  technique  in 
combination  with  the  diffusion  method,  while  Anderson 
(2007)  employed  a  sampling  error  approach  to  derive 
localization  from  multiple  ensembles  in  the  framework 
of  the  hierarchical  ensemble  filter  technique.  In  the  oil 
and  gas  exploration  industry,  anisotropic  localization 
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functions  were  derived  by  combining  the  regions  of  sen¬ 
sitivity  of  the  well  data  with  prior  geological  models  (e.g., 
Emerick  and  Reynolds  2011;  Chen  and  Oliver  2010). 

Another  direction  in  the  localization  techniques  was 
pioneered  by  Bishop  and  Hodyss  (2007)  who  proposed 
to  augment  the  original  ensemble  by  including  Schur 
cross  products  of  the  spatially  smoothed  ensemble  mem¬ 
bers.  Further  development  of  this  approach  (Bishop  and 
Hodyss  2009a, b;  Bishop  et  al.  2011;  Bishop  and  Hodyss 
2011)  demonstrated  its  flexibility  in  adapting  the  co- 
variances  to  the  4D  background  flow  structures,  especially 
in  the  case  of  strongly  inhomogeneous  statistics.  A  certain 
disadvantage  of  the  adaptive  localization  (AL)  technique 
is  a  relatively  high  computational  cost,  associated  with  the 
necessity  to  operate  with  the  expanded  ensemble.  A  good 
review  of  the  filtering/localization  techniques  was  recently 
given  by  Berre  and  Desroziers  (2010). 

In  this  study  we  employ  the  numerical  experimenta¬ 
tion  approach  of  Weaver  and  Mirouze  (2012)  who  tested 
various  approximations  of  the  ensemble-generated  co- 
variance  matrix  by  the  exponent  of  the  diffusion  oper¬ 
ator  in  an  idealized  configuration.  The  presented  work 
considers  four  localization  techniques  applied  to  three 
different  covariance  models  in  a  realistically  inhomoge¬ 
neous  2D  setting.  Our  major  focus  is  on  comparing  non- 
adaptive  and  adaptive  localization  methods  with  the 
techniques  based  on  modeling  sample  covariance  by 
polynomial  functions  of  the  diffusion  operator.  To  make 
the  comparison,  we  construct  inhomogeneous  covariance 
matrices  B,  generate  the  respective  ensembles,  and  re¬ 
trieve  B  from  a  limited  number  of  ensemble  members 
by  the  means  of  considered  localization  techniques.  In 
the  next  section  the  four  localization  methods  used  are 
briefly  overviewed.  Methodology  of  the  numerical  ex¬ 
periments  is  described  in  section  3.  In  section  4,  the  lo¬ 
calization  methods  are  compared  in  terms  of  accuracy 
in  approximating  B  for  various  ensemble  sizes  and  their 
computational  efficiency.  The  results  are  summarized 
and  discussed  in  section  5. 

2.  Methods  of  covariance  localization 

a.  Traditional  scheme 

Given  an  ensemble  {x*}/\/A^  -  1  €  of  K  normal¬ 
ized  error  perturbations  about  the  ensemble  mean  listed 
as  columns  of  the  K  X  N  matrix  X,  their  sample  co- 
variance  B  is  estimated  by 

B^cov{xJ  =  XXt.  (1) 

In  practice,  the  dimension  of  the  model  state  N  is  much 
larger  than  K ,  and  the  sample  estimate  (1)  always 


contains  spurious  correlations  at  large  distances.  To 
increase  the  accuracy  in  approximation  of  the  BEC  ma¬ 
trix  B,  Houtekamer  and  Mitchell  (1998)  proposed  to  as¬ 
sign  zero  correlations  to  the  components  of  x  separated 
by  distances  larger  than  a  certain  prescribed  value  d  (lo¬ 
calization  scale).  Technically,  such  a  “localized”  co- 
variance  matrix  B^  is  obtained  as  the  elementwise  (Schur) 
product  °  of  the  raw  sample  covariance  B  and  the  locali¬ 
zation  matrix  W^,  whose  off-diagonal  elements  are  set  to 
zero  if  the  distance  between  correlated  points  exceeds  d: 

B,  =  BoWd.  (2) 

This  method  simultaneously  suppresses  spurious  ensem¬ 
ble  correlations  located  far  from  the  diagonal  and  shrinks 
the  null  space  of  B,  whose  “raw”  dimension  N  -  K  +  1  is 
very  large,  and  thus  likely  inconsistent  with  the  rank  of  the 
true  BEC  matrix.  A  disadvantage  of  the  technique  is  that 
it  relies  on  a  heuristic  matrix  Wrf,  which  does  not  explicitly 
take  into  account  inhomogeneity  and  anisotropy  of  the 
background  flow  which  affects  the  BEC  evolution. 

b.  Adaptive  methods 

Recently,  Bishop  and  Hodyss  (2007,  2009a, b,  2011) 
developed  a  family  of  AL  schemes.  The  idea  is  to  compute 
W  as  the  sample  correlation  matrix  generated  by  Schur 
cross  products  x#  of  the  spatially  smoothed  (modulated) 
members  of  the  original  ensemble  (e.g.,  Bishop  and  Hodyss 
2009a,  2011): 

i/y =  (Sx^.) © (Sxy);  (3) 

where  S  is  a  suitably  chosen  smoothing  operator  while 
J  =  K(K  -I-  l)/2  is  the  size  of  the  modulated  ensemble. 
Assuming  that  the  columns  of  the  J  X  N  matrix  X  list 
perturbations  {x//}  of  the  modulated  ensemble  about 
their  mean  that  are  normalized  to  have  unit  variance 
and  divided  by  y/J  —  1,  the  adaptively  localized  BEC 
matrix  is 

B*  =  B«W*»Bo(XXT).  (4) 

To  further  increase  stability  and  computational  effi¬ 
ciency  of  the  AL  technique,  Bishop  and  Hodyss  (2011) 
supplemented  the  method  with  additional  multiplica¬ 
tion  by  W d: 

B*  =  B*W**Wrf.  (5) 

Recent  experiments  with  this  improved  AL  scheme 
have  shown  its  good  localization  properties  and  rea¬ 
sonable  numerical  performance  (Bishop  and  Hodyss  2011). 
A  certain  disadvantage  of  the  method  is  the  numerical  cost: 
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apart  from  the  necessity  to  smooth  ensemble  members, 
multiplication  by  Bf  requires  computing  a  convolution 
with  a  KJN  X  N  matrix,  whose  columns  are  x*<>X// °w„, 
where  are  the  columns  of  the  square  root  of  Wrf. 

c.  Modeling  sample  covariance 

Another  way  of  estimating  the  true  covariance  is  to 
create  its  full-rank  covariance  model  using  the  low-rank 
ensemble  approximation  (1).  In  recent  years  this  ap¬ 
proach,  fueled  by  the  developments  in  covariance  model¬ 
ing  with  the  diffusion  operator  (e.g.,  Weaver  and  Courtier 
2001;  Xu  2005;  Yaremchuk  and  Smith  2011;  Yaremchuk 
and  Sentchev  2012),  has  been  studied  by  many  authors 
(e.g.,  Belo  Pereira  and  Berre  2006;  Pannekoucke  and 
Massart  2008;  Pannekoucke  2009;  Sato  et  al.  2009; 
Weaver  and  Mirouze  2012). 

The  idea  of  the  approach  is  to  parameterize  the 
structure  of  the  true  BEC  matrix  by  the  diffusion  tensor 
field  DaP( Jt),  which  defines  the  positive-definite  diffu¬ 
sion  operator  D  =  —\aDa^\^ 

To  avoid  confusion  with  notations,  vectors  and  ma¬ 
trices  in  state  space  R^  are  denoted  by  the  boldface 
roman  and  boldface  san  serif  fonts,  respectively.  In  the 
2D  physical  space  R2  we  adopt  tensor  notation,  where 
vectors  and  matrices  are  boldface  and  italicized,  Greek 
indices  enumerate  coordinates,  take  the  values  1  and  2, 
and  summation  is  assumed  over  repeating  indices. 

The  operator  D  is  used  to  construct  the  B-approximating 
covariance  model  that  is  specified  by  a  positive  func¬ 
tion  F  of  D  in  order  to  meet  the  positive-definiteness 
property  of  B.  Furthermore,  for  computational  reasons 
it  is  desirable  that  F  could  be  computed  recursively  and 
at  the  same  time  it  should  invert  the  spectrum  of  D  (i.e., 
the  largest  eigenvalues  of  F{  D}  should  correspond  to  the 
smallest  eigenvalues  of  D).  The  latter  requirement  en¬ 
sures  the  smoothing  property  of  the  BEC  model,  which 
is  important  in  applications. 

Among  the  functions  satisfying  these  requirements 
are  the  exponent  and  its  nth-order  binomial  (spline) 
approximations: 


Fe{D}  =  exp(-°), 

(6) 

/  D\n 

£ 

o 

II 

£L! 

(7) 

The  functional  forms  in  (6)— (7)  are  used  to  define  the 
correlation  matrix  C,  which  can  be  easily  transformed 
into  B  by  the  renormalization  formula  B  =  VCV,  where 
V  =  diag(v),  and  v  e  R^  is  the  vector  of  rms  error  var¬ 
iances  (square  roots  of  the  diagonal  of  B).  The  elements 
u(x)  of  v  are  relatively  well  known  from  the  ensemble 
statistics  as  they  suffer  less  from  sampling  errors  than 


ensemble  estimates  of  the  correlations.  In  its  turn,  the 
correlation  matrix  C  can  be  obtained  from  F{  D}  by 
setting  its  diagonal  elements  to  unity: 

C  =  diag(f)-1/2F{D}diag(f)-1/2,  (8) 

if  a  good  approximation  to  the  diagonal  elements  f  of 
F{D}  is  available  (Purser  et  al.  2003;  Yaremchuk  and 
Carrier  2012). 

This  study  employs  functions  Fe  and  Fn  for  approxi¬ 
mating  the  BEC  matrix  by  selecting  Da^(x)  in  a  way  that 
the  matrix  B  =  VCV  given  by  (6)-(8)  fits  the  structure  of 
the  sample  covariance  (1)  for  small  distances  and  pro¬ 
duces  negligible  correlations  at  large  distances.  The 
latter  property  is  satisfied  by  the  functions  (6)-(7). 

A  standard  method  of  finding  D  for  the  functional 
forms  (6)— (7)  is  to  use  analytic  relationships  between  the 
derivatives  of  F{D}  in  the  vicinity  of  the  diagonal  (i.e.,  at 
small  separations  between  correlated  points)  and  the 
diffusion  tensor  (e.g.,  Belo  Pereira  and  Berre  2006;  Sato 
et  al.  2009;  Weaver  and  Mirouze  2012).  These  relation¬ 
ships  are  derived  under  the  assumption  that  local  de- 
correlation  scales  are  much  smaller  than  the  typical  scale 
of  spatial  variability  of  D.  In  that  case,  the  correlation 
matrix  elements  C(x,  y )  are  locally  homogeneous  (LH); 
that  is,  they  depend  only  on  the  relative  position  r  =  x  —  y 
of  the  correlated  points  x ,  y ,  and  can  be  written  down 
explicitly  (e.g.,  Yaremchuk  and  Smith  2011): 


Cc(r)=exp^-0, 

(9) 

" '  ’  2n~2(n  -  2)! 

(10) 

where 

(u) 

is  the  squared  distance  measured  in  terms  of  the  local 
decorrelation  scales  defined  by  the  eigenvalues  of  D  and 
K  is  the  Bessel  function  of  the  second  kind.  Dependence 
of  the  correlation  matrix  elements  on  the  distance  r  from 
the  diagonal  is  shown  in  Fig.  1. 

Direct  differentiation  of  (9)-(10)  at  zero  distance 
(r  =  0),  yields  the  following  relationships,  useful  for 
estimation  of  the  diffusion  tensor  for  the  models  (9)-(10), 
respectively: 

D-J(x)  =  -[Va\f}Cf],  (12) 

^W  =  -^[V„V^CJ.  (13) 
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Fig.  1.  Correlation  functions  of  the  Gaussian  and  second-order 
spline  models  described  by  (9)-{10). 


Here  square  brackets  denote  extracting  the  diagonal 
values  from  a  matrix.  This  approach  requires  C  to  be 
twice  differentiable  at  the  diagonal,  which  is  not  the  case 
for  spline  models  with  n  <  3.  Expressions  (12)— (13)  were 
obtained  in  the  2D  Cartesian  coordinates  by  Weaver 
and  Mirouze  (2012).  Similar  relationships  hold  for  an 
arbitrary  correlation  model  satisfying  the  conditions  of 
local  homogeneity  and  appropriate  differentiability  of 
the  correlation  function  at  r  —  0  (appendix  A). 

Taking  into  account  the  commutativity  of  the  ensem¬ 
ble  averaging  and  (  )  differentiation  operators  renders 
the  rhs  of  (12)— (13)  in  the  form  involving  correlations 
of  the  first  derivatives  of  the  ensemble  members  (see 
Belo  Pereira  and  Berre  2006;  Weaver  and  Mirouze  2012; 
appendix  B): 


.  „  J<VWV>>-<V)*<V> 


(14) 


This  expression  together  with  relationships  (12)— (13)  is 
more  convenient  for  numerical  estimation  of  D  via 
sample  correlations  because  it  is  formulated  in  terms  of 
the  ensemble  perturbations  and  does  not  involve  second 
derivatives.  Weaver  and  Mirouze  (2012)  have  shown  re¬ 
cently  that  the  method  is  capable  of  delivering  rms  ac¬ 
curacies  of  20%-80%  in  reconstructing  D~l  in  idealized 
2D  setting.  The  approach  has  a  few  drawbacks.  First,  the 
gradient  computation  tends  to  amplify  sampling  noise 
in  the  estimate  of  D~l.  The  inversion  of  D”1  is  also  prone 
to  error  amplification.  For  these  reasons,  the  technique  is 
often  supplemented  by  additional  smoothing  (Raynaud 
et  al.  2009;  Berre  and  Desroziers  2010;  Weaver  and 


Mirouze  2012).  Second,  the  relationship  (14)  cannot  be 
applied  to  the  BEC  models  that  are  not  differentiable 
at  the  diagonal,  such  as  the  second-order  ( n  =  2)  spline 
model  (7)  in  3D,  which  is  characterized  by  the  expo¬ 
nential  correlation  function. 

An  alternative  approach  is  to  estimate  the  diffusion 
tensor  directly  by  minimizing  the  difference  between 
the  ensemble  estimate  of  the  correlations  in  the  vicinity 
of  the  diagonal  and  its  local  analytic  approximations 
(9)-(10).  This  approach  is  likely  to  be  more  robust,  as  it 
does  not  involve  differentiation  and  matrix  inversion 
and  can  be  formulated  as  a  least  squares  problem  in  the 
space  of  the  unknown  elements  of  D. 

In  the  following  sections  we  compare  efficiency  of  the 
four  localization  methods:  nonadaptive  (section  2a), 
adaptive  (section  2b),  and  the  two  described  above 
methods  of  retrieving  the  diffusion  tensor  from  the  en¬ 
semble  covariances.  For  brevity,  we  will  refer  to  the 
latter  two  methods  as  “differential”  and  “integral” 
diffusion  localization  (DL)  schemes. 

To  explore  the  efficiency,  we  adopt  the  following  ex¬ 
perimentation  strategy:  after  specifying  the  “true”  co- 
variance  matrices  B,  the  respective  ensembles  are 
generated  and  then  the  obtained  ensemble  members  are 
used  to  retrieve  the  approximate  structure  of  B  by  a 
given  localization  method. 


3.  Methodology 

a.  Experimental  setting 

Numerical  experiments  with  simulated  ensembles  were 
performed  as  follows.  First,  the  true  BEC  matrix  was 
specified  together  with  the  ensemble  by  selecting  a  vari¬ 
ance  distribution  v(x)  and  a  correlation  model  (6}-(7)  in 
a  real  oceanic  domain  shown  in  Fig.  2.  The  variance  dis¬ 
tribution  was  chosen  to  simulate  surface  temperature 
variations  in  the  northern  Gulf  of  Mexico  near  the  mouth 
of  Mississippi.  The  true  distribution  of  D  (Fig.  2)  was 
specified  to  mimic  the  background  error  dynamics  driven 
by  near-coastal  topographically  controlled  circulation. 
We  assumed  that  the  corresponding  background  currents 
followed  the  depth  contours  and  the  larger  eigenvector 
of  D  was  oriented  in  that  direction  and  was  proportional 
to  the  magnitude  of  the  local  bathymetry  gradient.  In  the 
regions  where  bottom  slope  was  less  than  20%  of  its  rms 
value  over  the  domain,  the  diffusion  was  set  to  be  iso¬ 
tropic  with  the  decorrelation  scale  of  15  km  (see  appen¬ 
dix  C  for  more  details). 

Two  BEC  models  used  in  the  experiments  were  the 
Gaussian  (6)  and  the  second-order  spline  model  (7).  The 
corresponding  true  correlation  matrices  C*  and  C2  were 
computed  explicitly:  first,  all  the  columns  of  F(D)  were 
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Fig.  2.  True  distribution  of  the  longer  principal  axis  of  the  diffusion 
tensor  (km).  Labeled  contours  show  depth  in  meters. 


computed  as  convolutions  of  the  operators  (9)-(10)  with 
the  8  functions  located  in  every  grid  point  of  the  domain. 
The  resulting  matrices  were  then  renormalized  by  their 
diagonal  elements  using  (8),  and  the  true  BEC  matrices 
were  then  obtained  by 

B,=VC,V;  B2  =  VC2V.  (15) 

Sums  of  eight  columns  of  Ce  and  C2  are  shown  in  Fig.  3. 
The  maximum  anisotropy  is  observed  in  the  southeast 
comer  of  the  domain  characterized  by  the  steepest  to¬ 
pography.  The  total  number  of  matrix  elements  was 
46032  ~  2  X  107. 

The  simulated  ensembles  X,,  and  Xm  were  generated 
by 
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Fig.  3.  True  correlations  for  the  (a)  Ce  and  (b)  C2  models  ploiied 
for  eighi  different  points.  Locations  of  the  points  are  shown  by 
white  circles. 


X  =VC*/2R;  Xj-VC^R,  (16) 

where  R  is  the  K  X  N  matrix,  whose  columns  are  the 
random  vectors  with  N  =  4603  5-correlated  components 
evenly  distributed  with  unit  variance  and  the  square  root 
is  defined  by  C  =  C1/2(C1/2)T.  The  value  of  K  was  20  000. 

The  ensembles  X,.  and  Xm  were  then  used  to  estimate 
the  true  covariances  Be  and  B2  with  the  four  localization 
techniques  described  in  the  previous  section.  The  only 
exception  is  the  differential  method,  which  was  not  used 
with  the  spline  model  (7)  because  the  corresponding  cor¬ 
relation  function  (10)  is  not  differentiable  at  the  origin. 
In  all  the  experiments  the  localization  matrix  \Nd  was 
Gaussian  (9)  with  the  isotropic  diffusion  tensor  D  =  d2/, 
where  /  is  the  2  X  2  identity  matrix  and  d  is  a  tuning 
parameter  defined  in  the  next  section. 

Numerically,  the  action  of  /v{D}  on  a  state  vector  x 
was  approximated  by  the  recursive  scheme: 


.xp(-5),  *('-£)  1,  (17) 

which  can  be  interpreted  as  “time  integration”  of  the 
diffusion  equation  with  the  integration  period  defined 
by  the  maximum  eigenvalue  A  of  D/2  over  the  domain 
and  the  “time  step”  of  A  In.  Similarly,  F2{D}x  was  com¬ 
puted  by  iteratively  solving  the  system  of  equations, 

(l  +  °)2y  =  x,  (18) 

with  the  minimum  residual  algorithm  (Paige  et  al.  1995). 

Computing  the  action  of  the  operators  C*72  and  C\n> 
which  appear  in  the  relationships  in  (16)  requires  an 
algorithm  for  F{D}12,  which  was  obtained  by  halving 
the  number  of  time  steps  n  in  (17)  and  removing  the 
square  in  the  lhs  of  (18). 


February  2013 


YAREMCHUK  AND  NECHAEV 


853 


With  the  simulated  ensembles  in  (16)  at  hand,  the 
sample  covariance  matrices  B*  were  computed  via  (1) 
by  varying  the  number  of  samples  x*  randomly  picked 
from  these  ensembles.  Using  the  same  samples,  rms 
error  variance  fields  v(x)  and  the  correlation  matrices  C 
were  also  computed. 

Given  these  ensemble  statistics,  the  localized  esti¬ 
mates  of  the  true  covariance  matrix  were  computed  with 
four  localization  techniques  described  in  the  previous 
section  [(2),  (5),  and  (9)— (14)  for  the  DL  estimates]. 

Technically,  the  DL  estimates  were  obtained  by  fitting 
the  diffusion  tensor  field  to  the  structure  of  C  with  two 
techniques:  the  first  one  utilizes  the  approach  based  on 
differentiating  the  ensemble  members  [(12)— (14)],  whereas 
the  second  one  extracts  D( x)  from  sample  correlations  C 
by  minimization  of  the  cost  functions: 

J(x)  =  [  [C(*  -y)~  c (x,y)]2  dy  -  min  ,  (19) 

Ju>  D(x) 

where  C  is  given  by  (9}-{10)  and  cu  is  a  small  vicinity  of  x. 
Similar  approach  was  tested  in  a  less  general  formulation 
by  Pannekoucke  and  Massart  (2008)  for  the  2D  Gaussian 
correlations.  To  minimize  (19)  we  used  the  M1QN3  al¬ 
gorithm  of  Gilbert  and  Lemarechal  (1989)  that  reduced 
the  L2  norm  of  the  cost  function  gradient  by  three  orders  of 
magnitude  in  3-6  iterations. 

To  distinguish  between  the  two  DL  schemes,  the 
corresponding  estimates  will  be  labeled  by  the  super¬ 
scripts  '  and  °  for  the  differential  [(12)— (14)]  and  integral 
[(9)— (11),  (19)]  approaches,  respectively. 

After  the  diffusion  tensor  estimates  were  obtained  using 
either  the  first  or  the  second  method,  the  localized  estimates 
C'  and  C°  of  C  were  computed  using  (6)-(8).  Equation  (8) 
contains  the  diagonal  elements  of  F{D},  whose  direct 
computation  is  numerically  prohibitive  in  practice.  For  that 
reason,  approximate  formulas  were  used: 

f=(27r)_1F{rD}d,  (20) 

where  d  =  (detD)- 1 12  and  ye  =  0.33;  yj  =  0.28  for  the  Fe 
and  F2  models,  respectively  (Yaremchuk  and  Carrier 
2012). 

Performance  of  the  four  localization  techniques  was 
measured  in  terms  of  the  distance  between  the  ensemble- 
estimated  localized  covariances  B/,  B^,  B',,  B°  and  the 
true  covariance  B: 

P(B<’B)  =  V  |bT’  (21) 

where  |  |  denotes  the  Frobenius  norm.  Relative  dis¬ 
tances  between  the  respective  correlation  matrices 


were  measured  by  the  following  relationship  (Herdin 
et  al.  2005): 

f22) 


b.  Numerical  implementation 

In  addition  to  comparing  the  skills  of  the  localization 
methods,  their  computational  efficiencies  are  also  com¬ 
pared.  In  practical  applications,  B*  and  BJ  are  never 
computed  directly,  but  represented  in  the  “square  root” 
form  B*  =  B{/2(Bj/2)r  to  speed  up  computations.  By  vir¬ 
tue  of  the  “square  root  theorem”  (Bishop  et  al.  2011), 
and  B*1/2  are  the  KN  X  N  and  KJN  x  N  matrices, 
whose  columns  are  x*°wn  and  x*°Xj/°wn,  respectively 
(section  2b).  The  elements  of  localization  matrix  \Nd  were 
computed  explicitly  with  the  analytic  equation  (9).  At 
distances  exceeding  several  localization  scales  the  ele¬ 
ments  were  set  to  zero  to  avoid  senseless  multiplications 
by  the  tails  of  the  Gaussian  exponent.  In  the  numerical 
experiments  this  “cutoff’  distance  was  set  to  3d.  The 
nonzero  elements  of  the  columns  of  W  1  were  com¬ 
puted  by  reducing  y/2  times  the  localization  scale  in  (11). 

To  explore  the  impact  of  the  ensemble  size  on  accu¬ 
racy  of  the  localization  schemes,  experiments  were 
performed  with  five  ensemble  sizes:  k  =  4,  10,  50,  200, 
and  1000.  The  respective  modulated  ensembles  (section 
2b)  were  computed  in  a  different  manner  for  various  k. 
For  k-  4  and  10  both  double  and  triple  Schur  products 
of  the  raw  ensemble  members  were  used,  thus  creating 
74  =  (4  x  5)/2  +  (4  x  4  X  5)/2  =  50  and/10  =  (10  X  11)/ 
2  +  (10  x  10  x  ll)/2  =  605  members.  For  k  —  50  and  200 
only  the  double  products  were  used.  The  respective 
ensemble  sizes  were  1275  and  20  100.  With  k  =  1000 
only  20  000  randomly  selected  pairs  were  used  to  create 
{xy} .  The  smoothing  operator  S  [(3)]  was  also  isotropic 
Gaussian,  but  its  scale  ds  was  different  from  d.  Both 
d  and  ds  were  optimized  in  every  experiment  to  mini¬ 
mize  the  distance  (21)  from  the  true  covariance. 

The  DL  algorithms  had  additional  specific  features. 
Estimates  of  D '  obtained  from  (12)-(14)  were  first 
smoothed  with  the  scale  of  /  =  30  km,  then  symmetrized 
and  checked  for  the  positive  definiteness.  In  the  case  of 
a  negative  eigenvalue  (a  common  situation  for  k  =  4, 
10),  the  tensor  was  discarded.  The  resulting  gaps  were 
filled  with  horizontal  interpolation  and  smoothed  again 
with  the  same  scale. 

When  computing  D°,  the  lengths  of  principal  axes  and 
orientation  of  the  larger  axis  were  chosen  as  control 
parameters.  This  approach  eliminated  violation  of  posi¬ 
tive  definiteness  and  improved  stability  of  the  algorithm. 
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FlG.  4  Relative  errors  between  the  true  covariance  matrix 
(Gaussian  model)  and  its  ensemble  estimates  for  various  localiza¬ 
tion  techniques  as  a  function  of  the  ensemble  size  k.  Thick  dashed 
line  shows  the  error  of  the  nonlocalized  estimate  B  [(1)].  Thin 
dashed  line  is  the  error  of  the  variance  estimate.  Errors  of  the  NAL 
B t  (thin  line)  and  AL  B?  (thick  line)  methods  are  shown  in  gray. 
Solid  black  lines  correspond  to  the  differential  Bj  (thin  line)  and 
integral  B?  (thick  line)  DL  methods. 

The  fitting  domain  co  was  a  square  four  grid  steps  in  size. 
Tensor  parameters  were  smoothed  with  the  same  scale  as 
has  been  used  in  the  computations  of  D'. 

4.  Results 

a.  Skill  comparison 

Figure  4  compares  skills  [(21)]  of  the  four  localization 
techniques  for  the  Gaussian  covariance  model  as  a  func¬ 
tion  of  the  number  of  ensemble  members  k.  The  straight 
dashed  lines  provide  errors  for  the  raw  variance  and  co- 
variance  estimates  without  localization.  As  expected, 
both  p(B)  and  p([B])  closely  follow  the  law  1  l\fk  with  the 
variance  estimate  p([B])  being  approximately  20  times 
more  accurate  than  the  estimate  of  the  covariance. 

For  k  =  4,  the  difference  between  p(B*)  and  p( BJ) 
appears  negligible  because  of  the  extremely  large  sam¬ 
pling  errors,  which  cannot  be  reduced  by  updating  the 
ensemble  with  modulated  members.  In  the  “practical” 
range  of  10  <  k  <  500,  the  adaptive  scheme  delivers  a  2- 
3  times  better  estimate  than  the  nonadaptive  localiza¬ 
tion  (NAL)  technique,  but  this  advantage  disappears  at 
k  >  500  because  of  the  increase  of  raw  ensemble  skill. 
This  type  of  behavior  has  been  also  observed  in  the 
experiments  where  we  kept  both  localization  scale  d  and 
the  smoothing  scale  ds  constant  and  equal  to  100  km 
(i.e.,  did  not  optimize  their  values  for  a  given  k).  In  that 


case  the  error  curves  converged  at  slightly  larger  k  ~~ 
1200-1500. 

The  DL  schemes  demonstrate  a  significantly  better 
performance  at  k  <  20,  although  p(B9  is  20%-30% 
larger  than  p(B?)  starting  from  n  -  10.  Flattening  of  the 
curves  for  B^,B*  at  large  k  can  be  explained  by  two 
factors.  The  first  one  is  a  certain  inconsistency  of  the  true 
covariance  structure  with  the  LH  assumption  used  in  the 
derivation  of  (9)-(14):  Fig.  2  shows  that  the  typical  scale 
of  variability  of  the  diffusion  tensor’s  axes  is  compatible 
with  their  magnitude  throughout  the  domain,  and  in  some 
places  (e.g.,  steep  bottom  regions  in  the  southwest)  it 
is  even  smaller  than  the  local  decorrelation  scales.  The 
second  factor  is  associated  with  the  violation  of  the  LH 
assumption  in  computing  the  normalization  factors  with 
(20).  Although  (20)  is  capable  of  approximating  the  di¬ 
agonal  elements  at  the  error  level  of  5%-10%,  its  con¬ 
tribution  to  the  asymptotic  error  of  0.4  (Fig.  4)  is  not 
negligible.  Similar  observations  are  reported  in  the  ide¬ 
alized  experiments  of  Weaver  and  Mirouze  (2012). 

Figure  5  shows  the  absolute  difference  between  the 
eight  columns  of  C^,  C?  and  the  respective  columns  of 
the  true  correlation  matrix  for  the  Gaussian  model 
shown  in  Fig.  2a.  It  is  seen  that  the  difference  is  not  zero 
even  in  the  diagonal  points  (shown  by  black  circles) 
where  both  correlation  estimates  are  supposed  to  be 
equal  to  one  by  definition.  This  difference  can  be  vir¬ 
tually  embedded  as  an  additional  error  in  the  variance 
estimate  V,  which  is  primarily  defined  by  the  size  of  the 
ensemble.  In  the  reported  experiments  this  diagonal 
approximation  error  ranged  within  5%-8%,  and  started 
to  contribute  significantly  at  k  >  30  (i.e.,  when  the  var¬ 
iance  estimation  error  falls  below  10%;  lower  dashed 
line  in  Fig.  4).  The  impact  of  the  diagonal  approximation 
error  is  less  visible  when  comparing  covariance  matrices 
in  terms  of  (22),  which  is  more  sensitive  to  the  errors  in 
the  off-diagonal  elements  (Fig.  6). 

The  degree  of  inhomogeneity  of  the  true  covariance 
can,  in  principle,  be  assessed  from  asymmetry  of  the 
local  correlations  derived  from  the  ensemble  when  k  is 
large  enough  to  suppress  sampling  noise.  When  the  LH 
assumption  is  satisfied  with  high  accuracy,  the  correla¬ 
tion  matrix  elements  satisfy  (9)-(10),  and  therefore 
should  be  nearly  invariant  under  the  mirror  transfor¬ 
mations  r  —►  -r  in  the  vicinity  of  the  diagonal.  We 
checked  this  property  for  the  true  correlation  matrices 
and  found  relatively  high  degrees  of  asymmetry  (0.24  and 
0.28  for  C,  and  C2,  respectively).  In  combination  with 
5%-8%  diagonal  errors,  these  figures  may  explain  the 
asymptotic  error  level  in  approximating  the  true  co- 
variances  by  the  DL  schemes  (Fig.  4). 

Another  feature  observed  in  the  experiments,  is  a 
persistently  better  performance  of  the  DL  methods  at 
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FiG.  5.  Absolute  difference  between  eight  columns  of  the 
true  correlation  matrix  for  the  Gaussian  model  (Fig.  3a)  and  its 
DL  approximations  (a)  and  (b)  CJ  obtained  with  50  en¬ 
semble  members.  Filled  circles  show  locations  of  the  diagonal 
elements. 


small  ensemble  sizes  k  (Figs.  4  and  6).  One  may  assume 
that  this  property  could  be  attributed  to  the  fact  that 
the  DL  schemes  have  an  a  priori  advantage  because  the 
structure  of  the  true  covariances  is  already  embedded 
into  the  underlying  diffusion  models  used  for  approxi¬ 
mation.  To  check  this,  we  generated  an  alternative  true 
covariance  matrix  B„,  which  was  far  enough  from  both 
Br  and  B2  to  eliminate  this  advantage  (Fig.  7). 

To  do  this,  we  randomly  picked  1000  members  from 
each  of  the  ensembles  X,.  and  X2,  and  then  generated 
additional  20  000  members  using  the  adaptive  technique 
described  in  section  2b.  Pairs  for  Schur  cross  products 
were  composed  by  randomly  picking  members  from  the 
two  ensembles  and  never  from  one.  The  resulting 
22  000-member  ensemble  was  used  to  compute  B„  with 
(1).  After  that  the  columns  of  B„  were  additionally 


FIG.  6.  Relative  errors  pi  between  the  true  covariance  matrix 
(spline  model.  Fig.  2b)  and  its  ensemble  estimates  for  various 
localization  techniques  as  a  function  of  the  ensemble  size  k.  Thick 
dashed  line  shows  the  error  of  the  nonlocalized  estimate  B  [(1)). 
Thin  dashed  line  is  the  error  of  the  variance  estimate.  Errors  of 
the  NAL  B,  (thin  line)  and  AL  BJ  (thick  line)  methods  are  shown 
in  gray.  Solid  black  line  gives  the  error  of  the  integral  B^  DL 
method. 


smoothed  and  renormalized  to  have  the  same  variance  V 
as  the  original  models  Be  and  B2. 

Figure  8  demonstrates  that  in  the  case  of  B„  model 
the  approximation  errors  of  the  DL  schemes  are  still 
below  the  errors  of  the  AL  scheme  when  n  <  30-40. 
Furthermore,  the  DL  schemes  keep  being  competitive 
in  the  entire  range  of  the  practical  ensemble  sizes  (up  to 


Fig.  7.  Difference  between  the  300  largest  eigenvalues  of  B2  and 
B,  (gray  line)  and  of  Brt  and  B*.  (top  right)  Distances  between  the 
corresponding  matrices  are  shown. 
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Fig.  8.  As  in  Fig.  4,  but  for  the  true  covariance  Bn. 


n  —  150-200).  We  therefore  may  assume  that  better 
performance  at  small  ensemble  sizes  in  an  intrinsic 
property  of  the  DL  method,  which  could  possibly  be 
explained  by  its  enhanced  ability  to  better  capture  near- 
diagonal  structure  of  the  correlations.  However,  only 
experiments  with  real  assimilation  systems  can  confirm 
this  hypothesis. 

One  can  notice  a  relatively  weak  performance  of  the 
AL  scheme  (thick  gray  line  in  Fig.  8)  as  compared  to 
the  case  of  true  covariance  described  by  the  Be  model 
(Fig.  4).  Such  a  behavior  can  be  explained  by  the  fact 
that  the  smoothing  scale  ds  was  the  same  as  was  used 
for  generation  of  the  modulated  ensembles  in  the  ex¬ 
periments  with  Be.  In  general,  adjustment  of  the  locali¬ 
zation  scales  significantly  improved  the  approximation 
accuracy  of  B*  and  BJ,  especially  at  low  k  for  the  stan¬ 
dard  localization  scheme  whose  optimal  values  of  d(k) 
changed  in  a  wide  range  from  d( 4)  =  30  to  d(1000)  = 
500  km.  For  the  adaptive  scheme  variations  of  d  were 
significantly  smaller:  d( 4)  =  100  to  d( 4)  =  500  km. 

These  figures  shed  some  light  on  the  role  near-diagonal 
elements  play  in  the  overall  structure  of  the  considered 
covariance  matrices.  It  appears  that  accurate  estimation 
of  these  elements  eliminates  a  larger  portion  of  the  error 
in  approximation  of  the  true  covariance.  To  support  this 
idea,  we  computed  distances  between  the  three  consid¬ 
ered  covariances  Be ,  B2,  and  B„  and  their  approximations 
obtained  by  setting  to  zero  all  the  off-diagonal  elements, 
located  farther  than  a  certain  distance  r  (measured  in 
physical  space)  from  the  diagonal.  As  expected,  the  major 


portion  of  the  error  is  eliminated  when  elements  within 
the  mean  decorrelation  scale  are  accounted  for.  This 
feature  of  the  considered  covariances  partly  explains  the 
better  skill  of  the  DL  schemes  that  are  “more  focused”  on 
accurate  representation  of  the  near-diagonal  structure  of 
the  covariance  matrices.  In  addition,  DL  models  are  ca¬ 
pable  to  deliver  better  smoothness  away  from  the  di¬ 
agonal,  which  is  essential  for  elimination  the  imbalance 
problems  that  may  arise  when  prediction  models  are  used 
with  the  resulting  analysis  (e.g.,  Kepert  2011). 

b.  Computational  efficiency 

In  the  previous  section  we  have  shown  that  DL  schemes 
appear  to  be  competitive  in  accuracy  with  both  NAL  and 
AL  techniques  when  the  number  of  ensemble  members 
k  is  relatively  small.  When  k  >  70  -  100,  the  AL  scheme 
provides  better  accuracy  (Figs.  4-8),  but  the  DL  method 
may  still  remain  competitive  up  to  k  ~~  100.  On  the  other 
hand,  it  is  much  less  computationally  expensive,  because 
it  does  not  require  generation  of  the  costly  modulated 
ensemble. 

The  cost  of  localization  is  defined  by  the  multiplica¬ 
tion  of  the  square  root  of  the  localized  covariance  matrix 
by  a  state  vector.  In  the  case  of  the  NAL  scheme,  this 
product  involves  M  —  kNn d  multiplications,  where  nd  is 
the  number  of  nonzero  elements  in  the  column  of  W^2. 
For  the  AL  scheme  [(5)]  this  number  is  J  times  larger 
and  may  require  significant  computational  resources. 

The  cost  of  implementing  the  DL  schemes  consists  of 
two  components:  estimation  of  the  diffusion  tensor  and 
multiplication  by  the  square  root  of  the  localization 
matrix.  The  number  of  multiplications  required  to  com¬ 
pute  D'  at  a  grid  point  is  approximately  proportional 
to  9k ,  because  local  correlations  have  to  be  computed 
only  in  the  nearest  neighborhood  of  the  diagonal  and 
each  computation  involves  k  products  of  the  ensem¬ 
ble  members.  Differentiation,  inversion  [(12)— (13)],  and 
smoothing  adds  approximately  50  operations  for  a  grid 
point  thus  giving  the  estimate  of  M'  ~  (9k  +  50)Nfor  the 
overall  cost  of  computing  D '.  The  cost  of  multiplication 
by  the  square  root  of  B*t  is  proportional  to  Nn+m,  where 
n*  =  9  is  the  number  of  elements  in  the  (2D)  stencil  of 
O',  and  m  ~  102  is  the  number  of  either  “time  steps”  in 
case  of  Ce  or  the  number  of  iterations  in  solving  the  re¬ 
spective  linear  system  in  the  case  of  C2  localization 
model.  This  brings  the  estimate  of  the  total  number  of 
operations  to  M!  ~  9 (k  +  m  +  5)N . 

Computing  D°  is  somewhat  more  expensive  than  D' 
because  it  involves  solving  a  minimization  problem  at 
every  grid  point.  In  the  2D  case  considered,  estimation 
of  D°  required  approximately  25 (k  +  20 n0)  operations 
per  grid  point  where  n0  =  5  is  the  average  number  of 
iterations  required  for  convergence  of  the  minimization 
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routine  and  25  is  the  number  of  grid  points  occupied  by 
the  optimization  subdomain  co  [(19)]. 

Taking  the  typical  value  of  rij  =  49  for  the  number  of 
grid  points  in  the  localization  stencil,  the  following  es¬ 
timates  can  be  obtained: 

M  ~  50 kN, 

M*  ~  50kNJ, 

M'  ~  50kN[  0.2 + 

\  k  5k/ 

AC50*w(0.5+§+£). 

Assuming  that  k  >  1  and  taking  the  N AL  cost  M  =  50 kN 
as  a  benchmark,  the  following  estimates  of  the  (normal¬ 
ized  by  M)  localization  costs  M  can  be  obtained: 

M*=J\  A/' =  0.2(l+  j);  M°  =  0.2(2.5  +  jj  . 

(23) 

In  the  reported  experiments  the  typical  value  of  m  ranged 
between  120-180  for  the  Gaussian  model  and  150-300  for 
the  spline  model.  Thus,  for  the  ensemble  size  of  k  =  50 
both  DL  models  appear  to  be  computationally  competitive 
with  the  NAL  technique  (A?'  -  0.7  -  0.9,  M°e  -  1 .0  -  1.2). 
Similar  CPU  time  ratios  were  observed  in  the  reported 
experiments.  As  is  seen  from  (23)  the  computational 
advantage  of  the  DL  schemes  improves  with  the  growth 
of  the  ensemble  size  ky  although  their  accuracy  tends  to 
stagnate  (Figs.  4,  6,  and  8). 

5.  Conclusions 

Numerical  experiments  with  the  DL  schemes  in  a  re¬ 
alistically  inhomogeneous  2D  setting  have  shown  their 
competitiveness  with  the  NAL  and  AL  methods  in 
terms  of  accuracy  within  the  range  of  ensemble  sizes  k  ~ 
20-100  used  in  the  data  assimilation  practice.  For  larger 
ensemble  sizes  the  DL  method  does  not  give  any  error 
improvement  as  it  reaches  the  limits  imposed  by  the 
assumption  of  local  homogeneity. 

From  the  computational  point  of  view,  the  DL 
method  appears  to  be  compatible  with  the  NAL  tech¬ 
nique,  which  is  in  turn  less  expensive  than  the  adaptive 
algorithms  proposed  by  Bishop  and  Hodyss  (2007, 
2009a, b).  Conducted  experiments  also  indicate  that  the 
AL  method  is  significantly  more  accurate  than  NAL  in 
the  case  of  strongly  inhomogeneous  covariances  when 
the  ensemble  size  is  less  than  several  hundred. 

Comparison  of  the  differential  and  integral  DL  schemes 
have  shown  that  the  differential  method  is  20%-50%  less 


computationally  expensive,  although  it  appears  to  be 
somewhat  less  robust  and  accurate  when  applied  in  re¬ 
alistically  inhomogeneous  environment.  An  advantage  of 
the  integral  approach  is  that  it  can  be  utilized  with  cor¬ 
relation  models  that  are  not  differential  at  the  origin. 

It  should  be  also  noted  that  the  computational  efficiency 
of  the  DL  schemes  strongly  depends  on  the  number  of 
iterations  m  needed  to  compute  the  action  of  the  locali¬ 
zation  operator  on  a  state  vector.  This  number  is  con¬ 
trolled  by  the  ratio  of  the  local  decorrelation  scale  (length 
of  the  largest  principal  axis  of  D)  to  the  grid  step,  which 
never  exceeded  7  in  the  reported  experiments.  Therefore, 
the  DL  methods  may  lose  computational  efficiency  when 
the  model  is  capable  to  describe  motions  at  scales  well 
below  those  resolved  by  observations.  This  restriction 
can  be  bypassed  if  the  covariances  are  localized  on  a 
grid  compatible  with  the  decorrelation  scale,  a  technique 
suggested  by  Bishop  et  al.  (2011)  to  speed  up  the  locali¬ 
zation  algorithms. 

The  DL  algorithms  have  enough  room  for  further 
development  along  several  directions.  In  particular,  the 
degree  of  local  inhomogeneity  of  the  target  covariance 
could  possibly  be  assessed  by  monitoring  dependence  of 
spatial  asymmetry  of  the  local  correlations  on  the 
number  of  ensemble  members  used  for  their  evaluation. 
This  information  could  then  be  blended  in  the  cost 
function  (19)  to  prevent  overfitting  sample  correlations 
by  the  analytic  model.  Efficient  higher-order  approxi¬ 
mations  to  the  diagonal  elements  of  F{D}  could  also  be 
thought  out  to  improve  the  accuracy  in  estimating  the 
DL  correlation  matrix.  Finally,  the  overall  accuracy  of 
the  DL  covariance  estimates  could  also  be  improved 
through  their  renormalization  by  the  optimally  filtered 
(e.g.,  Raynaud  et  al.  2009;  Berre  and  Desroziers  2010) 
diagonal  elements  of  f  ®  v.  This  approach  can  simulta¬ 
neously  reduce  sampling  errors  in  the  variance  field  v 
estimates  and  errors  associated  with  the  LH  assumption 
in  computing  the  diagonal  elements  of  F{D}. 

One  should  also  keep  in  mind  that  ensembles  en¬ 
countered  in  large  geophysical  DA  problems  are  likely 
to  have  more  complicated  structure  than  the  simulated 
ensembles  described  by  (16).  In  particular,  real-life  en¬ 
sembles  are  often  biased  and  they  do  not  normally 
demonstrate  kTm  error  scaling  for  realistic  ensemble 
sizes.  On  the  other  hand,  the  “true”  covariance  matrices 
are  never  known  and  can  hardly  be  computed  for  real 
applications  in  the  nearest  future.  As  a  consequence,  the 
only  way  to  compare  localization  techniques  is  to  esti¬ 
mate  their  forecast  skill  and  computational  efficiency 
within  the  real  DA  problems.  Presented  results  give  only 
an  indication  that  further  studies  of  the  DL  methods  are 
worth  pursuing  as  they  seem  to  be  competitive  with 
other  localization  techniques.  A  definite  answer  could 
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be  given  only  by  the  aforementioned  experiments  with 
real  ensembles,  which  will  be  the  subject  of  our  future 
research. 
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APPENDIX  A 

Differentiation  of  the  Correlation  Functions 

To  simplify  the  notations,  denote  derivatives  of  a  cor¬ 
relation  function  C(r )  by  the  subscript  r  and  the  inverse 
of  the  diffusion  tensor  by  Rap.  The  second  derivative  of 
a  correlation  function  C(r)  is 

VoV„C(r)  =  C„(VarXV)  +  Cr\\pr.  (Al) 

Taking  the  first  and  second  derivatives  of  (11)  under  the 
assumption  of  local  homogeneity  yields 

W  =  X*-r<*]*  (A2> 

where 


is  bounded  at  r  — ►  0. 

After  substituting  (A2)  into  (Al)  and  rearranging  the 
terms,  (Al)  takes  the  form: 

=  yRafi  +  (c„  -  rafj .  (A3) 

Substitution  of  the  expression  in  the  rhs  of  (9)  into  (A3) 
and  taking  the  limit  r  — ►  0  yields  (12).  Similar  operation 
with  the  rhs  of  (10)  shows  that  the  second  term  in  the  rhs 
of  (A3)  is  zero  if  n  >  2,  whereas  the  first  term  is  equal  to 
nl(2  —  n).  Note  that  constraint  n  >  2  is  imposed  by  the 
condition  of  differentiability  of  the  correlation  function 
(10)  at  r  =  0. 

More  generally,  by  using  Fourier  representation  of 
the  covariance  function  [e.g.,  Eq.  (11)  in  Yaremchuk 
and  Smith  (2011)]  it  is  easy  to  show  that  the  relationship 


holds  for  arbitrary  correlation  functions  twice  differen¬ 
tiable  at  r  =  0  and  satisfying  the  local  homogeneity 
condition.  Therefore,  the  differential  method  that  is  based 
on  the  relationship 

^=(|™oC/r),V«V  (A5) 

could  be  applied  to  a  much  broader  class  of  correlation 
models  than  those  described  by  (6)-(7). 

APPENDIX  B 

Estimating  Second  Derivatives  of  the  Correlation 
Function  from  Ensemble  Perturbations 

By  definition,  the  tensor  of  second  derivatives  of  the 
BEC  matrix  B(x,  y)  ®  can  be  represented  in  two 
ways: 

Wfity  =  <(V*x,)(V£xy)>  =  V£V'(V,C,yVy) ,  (Bl) 

where  bold  italicized  superscripts  denote  the  variables  of 
differentiation  and  the  subscripts  enumerate  the  corre¬ 
sponding  coordinates  in  physical  space.  The  rhs  of  (Bl) 
can  be  rewritten  as 

VWVA,V,)  =  (KV(%Vc*y 

+  Vv(V*aVx).V£Cry 

+  v,(v'vy)vjcry  +  v,vyv£v'cxy 

(B2) 

Taking  the  value  of  (B2)  at  the  diagonal  (x  =  y)  under 
the  assumption  of  local  homogeneity  Cxy  =  Cx~y  im¬ 
plies  that  y  =  x  in  all  the  expressions  involving  V  and 
its  derivatives  and  -y  =  =  -[VaV^C]. 

Assuming  that  the  correlation  function  is  differentiable 
at  r  =  0  also  implies  that  its  gradients  at  r  =  0  are  zero 
and,  therefore,  two  middle  terms  in  the  rhs  of  (B2) 
vanish.  After  taking  into  account  the  right  equality  in 
(Bl)  and  the  definition  =  1,  (B2)  transforms  into 

<(V)°(V»  =  <VMV>  ■  VOV°[VaV/3C].  (B3) 

which  yields  (14)  after  rearrangement  of  the  terms. 
APPENDIX  C 

Diffusion  Tensor  Model 

Numerically,  the  diffusion  operator  is  defined  by 

D  =  (pV)t(i»V)  , 
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where  V  is  the  first-order  finite-difference  representa¬ 
tion  of  the  gradient  in  2D  and  v  is  the  square  root  of  the 
local  diffusion  tensor  (vTv  =  D)  represented  by 

“VC°S1'  Si"TVo-  (CD 

\0  1/  \-siny  cosy/  0 

Here  a0  —  15  km  is  the  background  decorrelation  scale, 
aa0  is  the  square  root  of  the  larger  eigenvalue  of  D, 
and  y  is  the  direction  of  the  eigenvector,  corre¬ 
sponding  to  this  eigenvalue.  The  larger  principal  axis 
of  D  is  aligned  along  the  depth  h(x ,  y)  contours  and 
its  magnitude  is  proportional  to  the  bottom  slope 
s  =  (hi  +  hyY  l.  Specifically,  the  parameters  a  and  y 
are  defined  by 


a  =  6(s  -  sc)(^s/7c  -  l)  +1, 

(C2) 

y  =  0(s-  sc )  tan_1(-/i,//iy), 

(C3) 

where  6  stands  for  the  step  function.  With  this  definition, 
the  diffusion  is  isotropic  (v  =  QqI)  when  the  slope  is 
below  the  critical  value  sc ,  which  is  chosen  to  be  s  = 
0.0003.  In  this  case,  only  20%  of  points  in  the  domain 
were  characterized  by  isotropic  diffusion. 
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